{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# pandas 进阶修炼 ｜早起Python\n",
    "<br>\n",
    "\n",
    "**本习题由公众号【早起Python & 可视化图鉴】 原创，转载及其他形式合作请与我们联系（微信号`sshs321`)，未经授权严禁搬运及二次创作，侵权必究！**\n",
    "\n",
    "\n",
    "\n",
    "本习题基于 `pandas` 版本 `1.1.3`，所有内容应当在 `Jupyter Notebook` 中执行以获得最佳效果。\n",
    "\n",
    "不同版本之间写法可能会有少许不同，如若碰到此情况，你应该学会如何自行检索解决。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5 - 数据筛选与修改\n",
    "\n",
    "\n",
    "<br>\n",
    "\n",
    "现在，终于来到 `pandas` 数据分析中最**高频**的操作，没有之一————数据的`增删改查`\n",
    "\n",
    "在后面的分组、聚合、透视、可视化等多个操作中，数据的筛选、修改操作也会不断出现。\n",
    "\n",
    "因此在本章节，我整理了 40 余个常用操作\n",
    "\n",
    "以习题的形式，带大家回顾利用 `pandas` 进行数据的增删改查。\n",
    "\n",
    "需要注意的是，**本节中的大部分习题都有两个、三个或者更多的解法**，但我仅提供一个思路。\n",
    "\n",
    "我提供的并非唯一也并非最优的方法，欢迎大家集思广益，加入 pandas 星球来提交你的解答并获得一定奖励。\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 初始化\n",
    "\n",
    "<br>\n",
    "\n",
    "该 `Notebook` 版本为**纯习题版**\n",
    "\n",
    "如果需要答案或者提示，可以微信搜索公众号「早起Python」获取！"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 加载数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 418,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_html(\"https://olympics.com/tokyo-2020/olympic-games/zh/results/all-sports/medal-standings.htm\")[0]\n",
    "\n",
    "# df = pd.read_csv(\"东京奥运会奖牌数据.csv\")  # 如果未联网可以使用这条命令加载数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5-1 数据修改\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1- <font color = '#FB8E00'>数据修改</font>｜列名\n",
    "\n",
    "<br>\n",
    "\n",
    "将原 `df` 列名 `Unnamed: 2`、`Unnamed: 3`、`Unnamed: 4` 修改为 `金牌数`、`银牌数`、`铜牌数`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 419,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2 - <font color = '#FB8E00'>数据修改</font>｜行索引\n",
    "\n",
    "<br>\n",
    "\n",
    "将第一列（排名）设置为索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 420,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3 - <font color = '#FB8E00'>数据修改</font>｜修改索引名\n",
    "\n",
    "<br>\n",
    "\n",
    "修改索引名为 金牌排名"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 421,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4 - <font color = '#FB8E00'>数据修改</font>｜修改值\n",
    "\n",
    "<br>\n",
    "\n",
    "将 ROC（第一列第五行）修改为 俄奥委会"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 422,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5 - <font color = '#FB8E00'>数据修改</font>｜替换值（单值）\n",
    "\n",
    "<br>\n",
    "\n",
    "将金牌数列的数字 `0` 替换为 `无`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 423,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6 - <font color = '#FB8E00'>数据修改</font>｜替换值（多值）\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "同时替换\n",
    "\n",
    "- 将 `无` 替换为 缺失值\n",
    "- 将 0 替换为 `None`\n",
    "\n",
    "注意：缺失值的 Nan 该怎么生成？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 424,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7 - 数据查看\n",
    "\n",
    "查看各列数据类型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 425,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "国家奥委会       object\n",
       "金牌数        float64\n",
       "银牌数         object\n",
       "铜牌数         object\n",
       "总分          object\n",
       "按总数排名       object\n",
       "国家奥委会代码     object\n",
       "dtype: object"
      ]
     },
     "execution_count": 425,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8 - <font color = '#FB8E00'>数据修改</font>｜修改类型\n",
    "\n",
    "将 `金牌数` 列类型修改为 `int`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 426,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 9 - <font color = '#1B85FF' >数据增加</font>｜新增列（固定值）\n",
    "\n",
    "<br>\n",
    "\n",
    "**重新加载数据** 并 **新增一列** 比赛地点，值为 `东京`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 427,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "微信搜索公众号「早起Python」，关注后可以获得更多资源！"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 10 - <font color = '#1B85FF' >数据增加</font>｜新增列（计算值）\n",
    "\n",
    "<br>\n",
    "\n",
    "**新增一列** 金银牌总数列，值为该国家金银牌总数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 428,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 11 - <font color = '#1B85FF' >数据增加</font>｜新增列（比较值）\n",
    "\n",
    "新增一列 最多奖牌数量 列，值为该过金银牌数量种最多的一个奖牌数量\n",
    "\n",
    "例如美国银牌最多，则为41，中国为38"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 429,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 12 - <font color = '#1B85FF' >数据增加</font>｜新增列（判断值）\n",
    "\n",
    "新增一列 金牌大于30\n",
    "\n",
    "如果一个国家的金牌数大于 30 则值为 是，反之为 否"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 430,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 13 - <font color = '#1B85FF' >数据增加</font>｜增加多列\n",
    "\n",
    "新增两列，分别是\n",
    "\n",
    "- 金铜牌总数（金牌数+铜牌数）\n",
    "- 银铜牌总数（银牌数+铜牌数）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 431,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 14 - <font color = '#1B85FF' >数据增加</font>｜新增列（引用变量）\n",
    "\n",
    "新增一列金牌占比，为各国金牌数除以总金牌数（gold_sum）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 432,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 433,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 15 - <font color = '#1B85FF' >数据增加</font>｜新增行（末尾追加）\n",
    "\n",
    "<br>\n",
    "\n",
    "在 df 末尾追加一行，内容为 0,1,2,3... 一直到 `df` 的列长度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 434,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 16 - <font color = '#1B85FF' >数据增加</font>｜新增行（指定位置）\n",
    "\n",
    "<br>\n",
    "\n",
    "在第 2 行新增一行数据，即美国和中国之间。\n",
    "\n",
    "数据内容同上一题"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 435,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 17 - <font color ='#27BE49'>数据删除</font>｜删除行\n",
    "\n",
    "<br>\n",
    "\n",
    "删除 `df` 第一行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 18 - <font color ='#27BE49'>数据删除</font>｜删除行（条件）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 19 - <font color ='#27BE49'>数据删除</font>｜删除列\n",
    "\n",
    "<br>\n",
    "\n",
    "删除刚刚新增的 比赛地点 列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 438,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 20 - <font color ='#27BE49'>数据删除</font>｜删除列（按列号）\n",
    "\n",
    "<br>\n",
    "\n",
    "删除 `df` 的 `7、8、9、10` 列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 439,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5-2 数据筛选\n",
    "\n",
    "以下所有答案要求返回的是一个 dataframe 而不是 Series 这样可以直接存储为 Excel 等格式的文件！"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 21 - 重新加载数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_html(\"https://olympics.com/tokyo-2020/olympic-games/zh/results/all-sports/medal-standings.htm\")[0]\n",
    "\n",
    "# df = pd.read_csv(\"东京奥运会奖牌数据.csv\")  # 如果未联网可以使用这条命令加载数据\n",
    "\n",
    "df.rename(columns={'Unnamed: 2':'金牌数',\n",
    "                  'Unnamed: 3':'银牌数',\n",
    "                  'Unnamed: 4':'铜牌数'},inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### 22 - <font color = '#FB8E00'>筛选列</font>｜通过列号\n",
    "\n",
    "提取第 1、2、3、4 列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 23 - <font color = '#FB8E00'>筛选列</font>｜通过列名\n",
    "\n",
    "提取 `金牌数、银牌数、铜牌数` 三列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 24 - <font color = '#FB8E00'>筛选列</font>｜条件（列号）\n",
    "\n",
    "筛选全部 **奇数列**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 25 - <font color = '#FB8E00'>筛选列</font>｜条件（列名）\n",
    "\n",
    "提取全部列名中包含 数 的列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 26 - <font color = '#FB8E00'>筛选列</font>｜组合（行号+列名）\n",
    "\n",
    "提取倒数后三列的10-20行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 27 - <font color = '#1B85FF' >筛选行</font>｜通过行号\n",
    "\n",
    "提取第 10 行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 28 -  <font color = '#1B85FF' >筛选行</font>｜通过行号（多行）\n",
    "\n",
    "提取第 10 行之后的全部行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 29 - <font color = '#1B85FF' >筛选行</font>｜固定间隔\n",
    "\n",
    "提取 0-50 行，间隔为 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 30 -  <font color = '#1B85FF' >筛选行</font>｜判断（大于）\n",
    "\n",
    "提取 金牌数 大于 30 的行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 31 -  <font color = '#1B85FF' >筛选行 </font>｜判断（等于）\n",
    "\n",
    "提取 `金牌数` 等于 10 的行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 32 - <font color = '#1B85FF' >筛选行</font>｜判断（不等于）\n",
    "\n",
    "提取 `金牌数` 不等于 `10` 的行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 33 - <font color = '#1B85FF' >筛选行</font>｜条件（指定行号）\n",
    "\n",
    "提取全部 **奇数行**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 34 - <font color = '#1B85FF' >筛选行</font>｜条件（指定值）\n",
    "\n",
    "\n",
    "提取 中国、美国、英国 三行数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 35 -  <font color = '#1B85FF' >筛选行</font>｜多条件\n",
    "\n",
    "在上一题的条件下，新增一个条件：**金牌数小于30**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 36 - <font color = '#1B85FF' >筛选行</font>｜条件（包含指定值）\n",
    "\n",
    "提取 国家奥委会 列中，所有包含 `国`的行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 37 - 筛选某行某列\n",
    "\n",
    "提取 `第 0 行第 2 列`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 38 - 筛选多行多列\n",
    "\n",
    "\n",
    "提取 第 0-2 行第 0-2 列\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 39 - <font color ='#27BE49'>筛选值</font>｜组合（行号+列号）\n",
    "\n",
    "提取第 4 行，第 4 列的值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  40 - <font color ='#27BE49'>筛选值</font>｜组合（行号+列名）\n",
    "\n",
    "提取行索引为 4 ，列名为 金牌数 的值\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 41 - <font color ='#27BE49'>筛选值</font>｜条件\n",
    "\n",
    "提取 国家奥委会 为 中国 的金牌数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 42 - <font color ='#27BE49'>筛选值</font>｜query\n",
    "\n",
    "使用 query 提取 金牌数 + 银牌数 大于 15 的国家"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  43 - <font color ='#27BE49'>筛选值</font>｜query（引用变量）\n",
    "\n",
    "使用 query 提取 金牌数 大于 金牌均值的国家"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](http://liuzaoqi.oss-cn-beijing.aliyuncs.com/2021/09/16/16317972442543.jpg?域名/sample.jpg?x-oss-process=style/stylename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "367px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
